feat: social publishing + NuGet #r + move perf + mesh stability batch #95

Open
rbuergi wants to merge 1921 commits into main from bug_fix

Conversation

rbuergi commented Apr 22, 2026

Contributor

Summary

77 commits of long-running work on bug_fix — grouped by theme:

  • Social publishing platform (new): MeshWeaver.Social + LinkedIn publisher + scheduled publishing pipeline (engine/queue/stats), LinkedIn OAuth connect + past-post ingest in Memex portal, per-user linked-account menu items.
  • NuGet in-process compile: #r "nuget:Pkg, Version" at the top of _Source/*.cs resolves via public NuGet.Protocol without an SDK on the container. Same resolver serves interactive markdown code cells.
  • Move-node parallelization + 30 s ceiling: FileSystemPersistenceService.MoveNodeAsync runs per-descendant WriteAsync/DeleteAsync through Task.WhenAll; new MeshOperationOptions (default Timeout = 30s) + WithMeshOperationTimeout(TimeSpan) override; HandleMoveNodeRequest chains .Timeout() on the persistence Observable so a stuck adapter can't hang the caller. Prod repro: a DAV2026 subtree move took 240 s and killed the MCP session; it is now bounded.
  • Compile / cache invalidation: sticky invalidation on CompilationCacheService, _Source/ edit re-invalidates owning NodeType, cross-silo broadcast via MeshChangeFeed, grain-dispose on node delete, live "Compiling … (Ns)" progress in LayoutAreaView.
  • Catalog & navigation: Children view groups by Category (falls back to NodeType), reactive Children catalog, self-as-default create location for non-NodeType nodes, sample orgs → Markdown for search visibility.
  • Workspace / stream robustness: Workspace remote-stream cache evicted on MeshChangeFeed events, resubscribe on owner dispose, DeleteLayoutArea emits a placeholder immediately and times out slow streams.
  • Infra & small fixes: settings.json overhaul, Delete-is-recursive MCP docs, HeartBeat silencing on Memex hubs, assembly-dir temp-dir fallback, IAsyncEnumerable aggregator fixes (satellite-safe GatherInputsAsync), xunit methodTimeout 30 s → 60 s, Anthropic Opus bump, icon generator, etc.

New test suites (selected)

  • test/MeshWeaver.Persistence.Test/MoveNodeRecursiveTest.cs — 10 tests: recursion, parallelism, source missing / target exists / storage throws / cancellation (all must not hang), Rx Timeout() contract, default-30s config.
  • test/MeshWeaver.Social.Test/*: InMemoryPublishQueueTest, LinkedInPublisherEngagementTest, PostStatsRefresherTest, ScheduledPostPublisherTest, FakePublisher.
  • test/MeshWeaver.Persistence.Test/WorkspaceCacheEvictionTest.cs, ResubscribeOnOwnerDisposeTest.cs, DeleteLayoutAreaIntegrationTest.cs.
  • test/MeshWeaver.Markdown.Test/PathUtilsTest.cs, test/MeshWeaver.MathDemo.Test/MatrixViewsTest.cs.

Contributors

Upstream already merged into this branch

Test plan

  • dotnet build succeeds
  • dotnet test test/MeshWeaver.Persistence.Test --filter MoveNodeRecursiveTest — 10/10 green (~8 s)
  • dotnet test test/MeshWeaver.Hosting.Monolith.Test --filter MoveNodeAsync — 5/5 green (regression guard)
  • dotnet test test/MeshWeaver.Social.Test — publish queue / scheduling / stats green
  • Manual prod smoke: move a 3-descendant subtree in memex-prod; confirms < 30 s and MCP session survives
  • Create a _Source/*.cs using #r "nuget:MathNet.Numerics, 5.0.0" — compiles & renders (cold + warm cache)
  • Delete a node then recreate at same path — fresh grain, fresh compile, no stale HubConfiguration
  • Navigate to a cold node — "Compiling (Ns)…" progress renders until the stream resolves
  • LinkedIn OAuth: sign in → /social/connect/linkedin → profile linked; menu shows connected account
  • Scheduled post fires through ScheduledPostPublisher → LinkedIn publisher posts; PostStatsRefresher pulls stats

🤖 Generated with Claude Code

github-actions Bot commented Apr 22, 2026

Test Results

   44 files     44 suites   24m 17s ⏱️
4 633 tests 4 615 ✅ 7 💤 11 ❌
4 658 runs  4 640 ✅ 7 💤 11 ❌

For more details on these failures, see this check.

Results for commit dee35a4.

♻️ This comment has been updated with latest results.

Copilot AI left a comment

Contributor

Pull request overview

This PR bundles several long-running feature and stability tracks across MeshWeaver core + Memex: social publishing foundations, in-process #r "nuget:..." compilation support (node-type + interactive markdown), move-operation performance/timeout hardening, and multiple UI/stream reliability improvements. It also standardizes the code folder naming from _Source/_Test to Source/Test across code, tests, docs, and samples.

Changes:

  • Introduces MeshWeaver.Social (options, DI wiring, publish queue, credential model) plus initial Memex wiring (LinkedIn connect entry points + user menu hooks).
  • Adds MeshWeaver.NuGet resolver + directive parser and integrates it into script compilation (#r "nuget:Pkg, Version"), including cache backends and tests.
  • Improves operational robustness: parallelized recursive moves, default 30s mesh-op timeout, “no endless spinner” navigation status UI, and remote stream resubscribe behavior.

Reviewed changes

Copilot reviewed 159 out of 265 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| test/MeshWeaver.StorageImport.Test/StorageImporterTests.cs | Updates test expectations/docs to Source/ naming. |
| test/MeshWeaver.Social.Test/PostStatsRefresherTest.cs | Adds stats refresher test coverage (needs deterministic timeout handling). |
| test/MeshWeaver.Social.Test/MeshWeaver.Social.Test.csproj | Adds new Social test project referencing Social + Fixture. |
| test/MeshWeaver.Social.Test/InMemoryPublishQueueTest.cs | Adds unit tests for publish queue due-drain + dedup. |
| test/MeshWeaver.Persistence.Test/FileSystemPersistenceTest.cs | Updates partition tests to Source/ naming. |
| test/MeshWeaver.MathDemo.Test/TestPaths.cs | Adds helper paths for MathDemo sample test assets. |
| test/MeshWeaver.MathDemo.Test/MeshWeaver.MathDemo.Test.csproj | Adds MathDemo test project and copies sample graph data to output. |
| test/MeshWeaver.Hosting.PostgreSql.Test/SatelliteQueryTests.cs | Updates code-path routing tests to Source/ naming. |
| test/MeshWeaver.Hosting.Monolith.Test/UserActivityAreaTest.cs | Updates regression test docs to Source/ naming. |
| test/MeshWeaver.Hosting.Blazor.Test/NavigationServiceTest.cs | Adjusts test to assert “no 404 flash” during retries. |
| test/MeshWeaver.Graph.Test/NuGetDirectiveParserTest.cs | Adds unit tests for parsing/stripping #r "nuget:...". |
| test/MeshWeaver.Graph.Test/NuGetAssemblyResolverTest.cs | Adds networked NuGet restore end-to-end tests (skippable via env var). |
| test/MeshWeaver.Graph.Test/MeshWeaver.Graph.Test.csproj | References new MeshWeaver.NuGet project. |
| test/MeshWeaver.FutuRe.Test/MeshWeaver.FutuRe.Test.csproj | Updates compile-included sample sources to Source/ paths. |
| test/MeshWeaver.Content.Test/CompilationErrorTest.cs | Updates broken-code test to Source/ path. |
| test/MeshWeaver.AI.Test/MeshPluginTest.cs | Updates MCP tool count expectations (adds RunTests/Move/Copy). |
| src/MeshWeaver.Social/SocialOptions.cs | Adds configurable knobs for publishing/stats/ingest scheduling. |
| src/MeshWeaver.Social/SocialExtensions.cs | Adds DI wiring for social publishing subsystem and hosted services. |
| src/MeshWeaver.Social/PlatformCredential.cs | Adds credential record model (access/refresh/expiry metadata). |
| src/MeshWeaver.Social/MeshWeaver.Social.csproj | Introduces Social library project. |
| src/MeshWeaver.Social/IPublishQueue.cs | Adds publish queue abstraction + in-memory implementation. |
| src/MeshWeaver.Social/IApprovalPublishBridge.cs | Defines bridge contract and PublishableSnapshot model. |
| src/MeshWeaver.NuGet/ResolvedPackageSet.cs | Adds resolver output model (assemblies, probing dirs, versions). |
| src/MeshWeaver.NuGet/NuGetServiceCollectionExtensions.cs | Adds DI extension to register resolver + cache. |
| src/MeshWeaver.NuGet/NuGetPackageReference.cs | Adds package reference model (id + version range). |
| src/MeshWeaver.NuGet/NuGetDirectiveParser.cs | Implements #r "nuget:..." extraction + source stripping. |
| src/MeshWeaver.NuGet/MeshWeaver.NuGet.csproj | Introduces NuGet resolver project and dependencies. |
| src/MeshWeaver.NuGet/INuGetPackageCache.cs | Adds optional persistent cache interface + null implementation. |
| src/MeshWeaver.NuGet/INuGetAssemblyResolver.cs | Adds resolver interface returning ResolvedPackageSet. |
| src/MeshWeaver.NuGet.AzureBlob/MeshWeaver.NuGet.AzureBlob.csproj | Adds Azure Blob cache backend project. |
| src/MeshWeaver.NuGet.AzureBlob/BlobNuGetPackageCacheExtensions.cs | Adds DI helper to register blob-backed cache. |
| src/MeshWeaver.Mesh.Contract/Services/MeshOperationOptions.cs | Adds mesh operation timeout options (default 30s). |
| src/MeshWeaver.Mesh.Contract/Services/IStorageAdapter.cs | Updates docs/examples to Source/ naming. |
| src/MeshWeaver.Mesh.Contract/Services/INavigationService.cs | Adds Status observable contract for UI progress reporting. |
| src/MeshWeaver.Mesh.Contract/Services/IIconGenerator.cs | Adds icon generator abstraction returning an observable SVG. |
| src/MeshWeaver.Mesh.Contract/PartitionDefinition.cs | Updates standard table mappings (Source/Testcode) and clarifies semantics. |
| src/MeshWeaver.Mesh.Contract/MeshExtensions.cs | Adds timeout override + move timeout enforcement + grain dispose on delete. |
| src/MeshWeaver.Mesh.Contract/CodeConfiguration.cs | Updates docs to Source/ naming. |
| src/MeshWeaver.Kernel.Hub/MeshWeaver.Kernel.Hub.csproj | Removes Interactive package mgmt dependency; references MeshWeaver.NuGet. |
| src/MeshWeaver.Hosting/Persistence/MigrationUtility.cs | Updates migration heuristics to include Source/Test + legacy _Source/_Test. |
| src/MeshWeaver.Hosting/Persistence/FileSystemStorageAdapter.cs | Treats Source/Test as code paths + keeps legacy compatibility. |
| src/MeshWeaver.Hosting/Persistence/FileSystemPersistenceService.cs | Parallelizes descendant move I/O (with concurrency implications). |
| src/MeshWeaver.Hosting/Persistence/CachingStorageAdapter.cs | Updates code sub-namespace detection (Source/Test + legacy). |
| src/MeshWeaver.Hosting.PostgreSql/PostgreSqlPartitionedStoreFactory.cs | Guards against source/test mistakenly becoming schemas. |
| src/MeshWeaver.Hosting.PostgreSql/PostgreSqlCrossSchemaQueryProvider.cs | Filters malformed parameters to avoid NRE during SQL interpolation. |
| src/MeshWeaver.Hosting.Blazor/MeshWeaver.Hosting.Blazor.csproj | Adds NU1510 suppression. |
| src/MeshWeaver.Graph/PartitionTypeSource.cs | Updates docs to Source/ naming. |
| src/MeshWeaver.Graph/MeshWeaver.Graph.csproj | References MeshWeaver.NuGet. |
| src/MeshWeaver.Graph/MeshNodeLayoutAreas.cs | Improves create href behavior + reactive/grouped children catalog. |
| src/MeshWeaver.Graph/MeshDataSource.cs | Updates docs to Source/ naming. |
| src/MeshWeaver.Graph/Configuration/ScriptCompilationService.cs | Integrates NuGet directive parsing + resolver into compilation. |
| src/MeshWeaver.Graph/Configuration/NodeTypeDefinition.cs | Updates docs/examples to Source/ naming. |
| src/MeshWeaver.Graph/Configuration/MeshDataSourceNodeType.cs | Changes sources namespace constant to Source. |
| src/MeshWeaver.Graph/Configuration/GraphConfigurationExtensions.cs | Registers NuGet resolver and uses Source code path. |
| src/MeshWeaver.Graph/Configuration/CodeNodeType.cs | Treats Code nodes as primary content; defines Source/Test constants. |
| src/MeshWeaver.Documentation/Data/DataMesh/UnifiedPath.md | Documents @/ semantics and HTML-href pitfalls. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Profile/Source/SocialMediaProfileLayoutAreas.cs | Adds SocialMedia profile layout areas example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Profile/Source/SocialMediaProfile.cs | Adds SocialMedia profile content model example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Post/Source/SocialMediaPost.cs | Adds SocialMedia post content model example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia/Post/Source/Platform.cs | Adds SocialMedia platform reference-data example. |
| src/MeshWeaver.Documentation/Data/DataMesh/SocialMedia.md | Updates docs to Source/ naming and authoring guidance. |
| src/MeshWeaver.Documentation/Data/DataMesh/SatelliteEntities.md | Clarifies Source/Test are primary content, not satellites. |
| src/MeshWeaver.Documentation/Data/DataMesh/NodeTypes.md | Adds Node Types documentation index page. |
| src/MeshWeaver.Documentation/Data/DataMesh/NodeTypeConfiguration.md | Updates docs to Source/ naming. |
| src/MeshWeaver.Documentation/Data/DataMesh/NodeOperations.md | Updates docs to Source/ naming. |
| src/MeshWeaver.Documentation/Data/DataMesh/DataConfiguration.md | Updates docs to Source/ naming. |
| src/MeshWeaver.Documentation/Data/DataMesh/CreatingNodeTypes.md | Updates docs to Source/Test naming throughout. |
| src/MeshWeaver.Documentation/Data/DataMesh.md | Updates TOC links and adds NuGet packages bullet. |
| src/MeshWeaver.Documentation/Data/Architecture/PartitionedPersistence.md | Updates persistence routing docs for Source/Test. |
| src/MeshWeaver.Documentation/Data/Architecture/MeshGraph.md | Updates examples to Source/ naming. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionSampleData.cs | Adds cession sample dataset for docs/demo. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionResultsArea.cs | Adds reactive charting layout area example. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionEngine.cs | Adds pure business logic sample for cession calculations. |
| src/MeshWeaver.Documentation/Data/Architecture/BusinessRules/Cession/Source/CessionData.cs | Adds content models for cession example. |
| src/MeshWeaver.Data/Serialization/SyncStreamOptions.cs | Adds configurable heartbeat interval for sync streams. |
| src/MeshWeaver.Data/Serialization/JsonSynchronizationStream.cs | Implements resubscribe-on-owner-dispose logic. |
| src/MeshWeaver.Blazor/Pages/ApplicationPage.razor | Switches to NavigationStatus-driven progress/not-found/error UI. |
| src/MeshWeaver.Blazor/Components/NavigationProgressBar.razor.css | Adds styling for full-page vs compact overlay progress bar. |
| src/MeshWeaver.Blazor/Components/NavigationProgressBar.razor | Adds reusable “spinner + message” component. |
| src/MeshWeaver.Blazor/Components/MeshSearchView.razor.cs | Adds Category grouping fallback to NodeType. |
| src/MeshWeaver.Blazor/Components/LayoutAreaView.razor.cs | Adds stream lifecycle logging and additional diagnostics. |
| src/MeshWeaver.Blazor/Components/LayoutAreaView.razor | Surfaces compilation progress indicator before first stream emission. |
| src/MeshWeaver.Blazor/Components/CompileProgressIndicator.razor.css | Adds styling for compilation progress banner. |
| src/MeshWeaver.Blazor/Components/CompileProgressIndicator.razor | Adds polling UI component for active NodeType compilation. |
| src/MeshWeaver.Blazor.Portal/MeshWeaver.Blazor.Portal.csproj | Adds NU1510 suppression. |
| src/MeshWeaver.Blazor.AI/MeshWeaver.Blazor.AI.csproj | Adds NU1510 suppression. |
| src/MeshWeaver.Blazor.AI/McpMeshPlugin.cs | Adds Patch/Move/Copy MCP tools and improves tool descriptions. |
| src/MeshWeaver.AI/ThreadLayoutAreas.cs | Adds debug logging around streaming view emission. |
| src/MeshWeaver.AI/IconGenerator.cs | Adds default AI-backed IIconGenerator implementation. |
| src/MeshWeaver.AI/DelegationCompletedEvent.cs | Removes delegation tracker/event types. |
| src/MeshWeaver.AI/Data/Agent/Worker.md | Updates @/ link guidance (no raw HTML href with @/). |
| src/MeshWeaver.AI/Data/Agent/ToolsReference.md | Updates @/ link guidance and provides correct/incorrect table. |
| src/MeshWeaver.AI/Data/Agent/Orchestrator.md | Updates @/ link guidance for agent outputs. |
| src/MeshWeaver.AI/AIExtensions.cs | Removes old type registration; registers IIconGenerator. |
| memex/aspire/Memex.Portal.Distributed/Program.cs | Registers blob-backed NuGet package cache in distributed deployment. |
| memex/aspire/Memex.Portal.Distributed/Memex.Portal.Distributed.csproj | References MeshWeaver.NuGet.AzureBlob. |
| memex/aspire/Memex.Database.Migration/Program.cs | Adds source/test to reserved schema list. |
| memex/aspire/Memex.AppHost/Program.cs | Adds LinkedIn secret/env wiring + sets NUGET_PACKAGES cache dir. |
| memex/Memex.Portal.Shared/Social/SocialMediaUserMenuProvider.cs | Adds “Social Media” shortcut on a user’s own node (lazy hub creation). |
| memex/Memex.Portal.Shared/Social/ApiCredentialNodeType.cs | Adds NodeType for PlatformCredential stored under _ApiCredentials. |
| memex/Memex.Portal.Shared/Pages/Login.razor | Adds “Connect LinkedIn for publishing” CTA on login page. |
| memex/Memex.Portal.Shared/OrganizationNodeType.cs | Switches to default layout areas registration. |
| memex/Memex.Portal.Shared/MemexConfiguration.cs | Adds LinkedIn publisher wiring, @/ redirect middleware, and routes. |
| memex/Memex.Portal.Shared/Memex.Portal.Shared.csproj | References MeshWeaver.Social. |
| memex/Memex.Portal.Monolith/appsettings.Development.json | Enables debug logging for LayoutAreaView. |
| MeshWeaver.slnx | Adds new projects (NuGet, NuGet.AzureBlob, Social, new test projects). |
| Directory.Packages.props | Adds NuGet.* package versions for resolver implementation. |
| CLAUDE.md | Documents @/ local-only rule and href/URL restrictions. |
| (Various) samples/Graph/... | Adds/updates many sample NodeTypes and content under Source/ to reflect new conventions and demos. |

Comment thread test/MeshWeaver.Social.Test/PostStatsRefresherTest.cs
Comment thread src/MeshWeaver.Hosting/Persistence/FileSystemPersistenceService.cs Outdated
rbuergi added a commit that referenced this pull request Apr 22, 2026
…+ test helpers

Recursive DeleteNodeRequest handled on a node's own hub was deadlocking: the final
DeleteSelfFromStorage posted Ok and DisposeRequest from the dying hub, so the Ok
raced callback disposal on the caller and was lost. Introduce CommitNodeDeletionMessage
and forward the terminal commit (storage delete + reply + grain dispose) to the
resolved mesh hub (walking ParentHub upward) — Sender becomes the stable mesh hub,
FIFO on the caller's inbound queue guarantees Ok resolves the RegisterCallback before
DisposeRequest arrives.

Also addresses two Copilot review comments on PR #95:

- FileSystemStorageAdapter.DeleteAsync empty-directory ascent is now concurrency-
  tolerant: wraps the enumerate + Directory.Delete in try/catch, swallowing the
  DirectoryNotFoundException race and breaking on IOException (non-empty / in-use).
  Required because FileSystemPersistenceService.MoveNodeAsync now parallelizes
  descendant deletes via Task.WhenAll.

- PostStatsRefresherTest.WaitUntilAsync throws TimeoutException with a descriptive
  message instead of returning silently on deadline, so the test cannot green-tick
  a stats-refresh that never happened.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

rbuergi commented Apr 27, 2026

Contributor Author

@copilot resolve the merge conflicts in this pull request

Copilot AI commented Apr 27, 2026

Contributor

@copilot resolve the merge conflicts in this pull request

Resolved. The merge with origin/main is done in commit 6899990.

Conflicts resolved:

  • .claude/settings.json — kept HEAD deletion
  • Login.razor — kept NavigateToConnectLinkedIn (HEAD) + switched to OnInitializedAsync (origin/main)
  • AppHost/Program.cs — merged origin/main's optional-secrets pattern (nullable embeddingKey/googleClientSecret, customDomainValue guard) with HEAD's LinkedIn params (also made optional/nullable) and local-test/local-prod storage branch
  • Memex.Database.Migration/Program.cs — kept HEAD's data-repair v8 (fix ThreadMessage.MainNode) and v9 (rename _Source/_Test path segments)
  • SecurityService.cs — kept HEAD's refactored CollectStaticRoleIds returning (roleIds, cap); origin/main's permission-evaluation logic is already present in the new reactive GetEffectivePermissions method

rbuergi requested a review from Copilot May 10, 2026 05:41

Copilot AI left a comment

Contributor

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

rbuergi requested a review from Copilot May 10, 2026 06:49

Copilot AI left a comment

Contributor

Copilot wasn't able to review this pull request because it exceeds the maximum number of files (300). Try reducing the number of changed files and requesting a review from Copilot again.

rbuergi commented May 10, 2026

Contributor Author

Code review — recent stability batch

Status: ✅ All 11 items in this comment addressed. See per-item commit SHAs in each header. Verification: Memex.Portal.Distributed builds clean; the four tests covering these changes (IsExecutingLifecycleTest, ChatHistoryTest ×2, CancelThreadExecutionTest) pass locally.

Manual review of the last ~20 commits since 8c5f37c80 (the doc commit). Focused on the synced-query consolidation, multi-query UNION feature, ThreadExecution refactor, and new tests. Copilot's two prior comments are already addressed in code. Findings below are grouped by severity.

Correctness — should fix before merge

1. ✅ e68636aac PostgreSqlStorageAdapter.QueryNodesAsync(IReadOnlyList<ParsedQuery>, …): parameter-rename can mangle SQL.
File: src/MeshWeaver.Hosting.PostgreSql/PostgreSqlStorageAdapter.cs (the new UNION overload, ~line 530).

foreach (var (k, v) in perParams)
{
    var newKey = "@" + prefix + k.TrimStart('@');
    renamedSql = renamedSql.Replace(k, newKey);
    renamedParams[newKey] = v;
}

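Dictionary<string,object> enumeration order is not guaranteed, and string.Replace matches substrings rather than whole parameter names. If perParams contains both @p and @p1, replacing @p also rewrites the leading @p of @p1, so @p1 ends up as @q0_p1 only by coincidence. The rename breaks as soon as a key matches the start of an already-renamed token: with @p and @q under prefix q0_, processing @p first yields @q0_p, and processing @q afterwards mangles it into @q0_q0_p, while the reverse order produces correct SQL. Mixed-order builds will silently drift. string.Replace also clobbers @… substrings inside string literals or JSONB path comparisons.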

Fix: single regex pass keyed on @<name> word boundary, gated on perParams.ContainsKey so we don't rewrite literal @ tokens.
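
A minimal sketch of the single-pass rename described in this fix, not the code from e68636aac. SqlParamRenamer, Rename, and the identifier pattern are hypothetical, and the sketch assumes perParams keys carry the leading @. Because the regex always consumes a whole @name token and each token is rewritten at most once, the result does not depend on dictionary enumeration order.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical helper illustrating the fix; names are assumptions.
public static class SqlParamRenamer
{
    // Matches a whole "@name" token, so "@p" can never match inside "@p1".
    private static readonly Regex ParamToken =
        new Regex(@"@[A-Za-z_][A-Za-z0-9_]*", RegexOptions.Compiled);

    public static (string Sql, Dictionary<string, object> Params) Rename(
        string sql, IReadOnlyDictionary<string, object> perParams, string prefix)
    {
        // Rename the parameter keys themselves.
        var renamed = new Dictionary<string, object>();
        foreach (var (k, v) in perParams)
            renamed["@" + prefix + k.TrimStart('@')] = v;

        // One pass over the SQL: only tokens that are known parameters are
        // rewritten; any other "@..." text is left untouched.
        var newSql = ParamToken.Replace(sql, m =>
            perParams.ContainsKey(m.Value)
                ? "@" + prefix + m.Value.Substring(1)
                : m.Value);

        return (newSql, renamed);
    }
}
```

Note that a string literal containing a token exactly equal to a parameter name would still be rewritten; the ContainsKey gate only protects @ tokens that are not parameters.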

2. ✅ e68636aac UNION (vs UNION ALL) dedup is row-wise, not path-wise.
Same file, same overload. The comment claims "same path emitted by two queries collapses to one row, matching the engine's path-keyed dictionary fold" — but UNION only collapses rows that are byte-identical across all selected columns. Two queries returning the same MeshNode with a slightly-different LastModified (concurrent writer) won't dedup.

Fix: UNION ALL wrapped in SELECT DISTINCT ON (namespace, id) … ORDER BY namespace, id, last_modified DESC. (No literal path column is projected; (namespace, id) is the path-keyed identity tuple. Newest version wins the tie-break.)

3. ✅ e68636aac PostgreSqlMeshQuery.ObserveQuery<T> ignores request.Queries for change detection.
src/MeshWeaver.Hosting.PostgreSql/PostgreSqlMeshQuery.cs:360-401. The method parsed only request.Query (single string), and the change-notifier filter used the first query's normalizedBasePath + effectiveScope for PathMatcher.ShouldNotify. Multi-query observations correctly fanned out to all queries inside CollectQueryResultsAsync, but live updates that match only query #2's path/scope wouldn't trigger a re-run.

Fix: parse every query in request.EffectiveQueries, build per-query (basePath, scope) filters, OR-join them in the change-notifier subscription.

4. ✅ e68636aac MeshQueryEngine Activity post-filter uses only first query's basePath.
src/MeshWeaver.Hosting/Persistence/Query/MeshQueryEngine.cs:125-138, 183-196. When parsedQuery.Source == QuerySource.Activity, the post-filter scanned descendants of firstBasePath for Activity satellites — queries #2+ with unrelated basePaths had their Activity matches filtered against the wrong subtree.

Fix: CollectMatchedAsync returns the list of every query's basePath; the activity post-filter scans every base path's descendants and unions activity-main-paths.

Race / lifecycle hazards

5. ✅ 478fdaa93 ThreadExecution.RecoverStaleExecutingThread 2-minute window contradicts "no time limits" commit.
src/MeshWeaver.AI/ThreadExecution.cs:175-180. Commit 6dc436bf5 made the policy explicit, but recovery still said "Only recover truly stale ones (started > 2 minutes ago or no timestamp)." A legitimate slow execution that crashes after 2+ minutes wouldn't be recovered → IsExecuting=true forever.

Fix: drop the time-based heuristic in favour of a structural one — skip recovery only when the thread is still an auto-execute candidate (PendingUserMessage + ActiveMessageId set, i.e. WatchForExecution will pick it up).

6. ✅ 478fdaa93 Subject<StreamingSnapshot> not disposed.
src/MeshWeaver.AI/ThreadExecution.cs:890. Fix: using var snapshots = new Subject<…>().

7. ✅ eea8ed10a — Sample(100ms) terminal-status race regression test.
The terminal-status guard correctly prevents Streaming from regressing Completed/Cancelled/Error in PushToResponseMessage. Fix: added a regression assertion in IsExecutingLifecycleTest that final ThreadMessage.Status == Completed after a successful echo run.

8. ✅ 478fdaa93 HandleCancelStream runs after CTS-storage race.
src/MeshWeaver.AI/ThreadExecution.cs:1284-1289. parentHub.Set(executionCts) happened around line 847, but IsExecuting=true flipped earlier in HandleSubmitMessage. A cancel arriving in that window was a no-op.

Fix: pre-allocate the CancellationTokenSource and store it on the thread hub in HandleSubmitMessage before posting SubmitMessageResponse. ExecuteMessageAsync reuses it from the parent-hub slot (with a fresh-CTS fallback for the auto-execute path that bypasses HandleSubmitMessage).

Style / consistency

9. ✅ 478fdaa93 — Triple-stacked <summary> XML doc tags.
Collapsed both blocks (WatchForExecution, NotifyParentCompletion) to a single <summary>.

10. ✅ eea8ed10a IsExecutingLifecycleTest text-pattern wait inconsistent with ChatHistoryTest.
Fix: migrated to ThreadMessage.CompletedAt is not null — same pattern as ChatHistoryTest.SubmitAndWait after commit ab3af8b70.

11. ✅ e68636aac — Limit-on-first-query semantics.
request.Limit was applied only to parsedList[0]; query #0 could hit its limit before yielding its most relevant rows while queries #1+ contributed unbounded — making the result iteration-order dependent.

Fix: drop the per-query Limit injection. Limit is enforced post-union via MinLimit(request.Limit, firstParsed.Limit) in both engines, so a request-level cap can't be circumvented and an in-query limit:N still wins when smaller.

✅ Looks good (no action needed)

  • SyncedQueryMeshNodes doc-comment now matches the dict-from-query-events fold (post the doc commit).
  • LoadFullConversationHistoryFromMesh correctly reads the live thread's Messages list and resolves each cell via GetMeshNodeStream (per-node hub) — sidesteps the stale-index race the comment calls out.
  • MultiQueryUnionEngineTests covers the union semantics on the in-memory engine without needing a testcontainer.
  • CancelThreadExecutionTest rewrite (commit-pending) correctly uses "Generating response..." as the CTS-armed signal.
  • The terminal-status guard pattern (current.Status is Completed or Cancelled or Error && requestedStatus == Streaming → keep current) is the right shape.
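
The terminal-status guard in the last bullet can be written as a small pure function. This is an illustration only: the MessageStatus enum (including the Pending member) and the StatusGuard.Resolve helper are hypothetical names, not the types used by PushToResponseMessage.

```csharp
// Hypothetical status enum; only Streaming, Completed, Cancelled and Error
// are named in the review, Pending is an assumption.
public enum MessageStatus { Pending, Streaming, Completed, Cancelled, Error }

public static class StatusGuard
{
    // A late Streaming update must not overwrite a terminal status;
    // every other requested transition is accepted as-is.
    public static MessageStatus Resolve(MessageStatus current, MessageStatus requested) =>
        (current is MessageStatus.Completed or MessageStatus.Cancelled or MessageStatus.Error)
            && requested == MessageStatus.Streaming
            ? current
            : requested;
}
```

The guard is what makes a delayed Sample(100ms) emission harmless once the final status has been written.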

rbuergi commented May 10, 2026

Contributor Author

Code review — part 2: rest of the PR

Status: ✅ All 12 items in this comment addressed. See per-item commit SHAs in each header. NuGet validation in #14 was deferred at first then closed in 6c3e60925.

Continuing review on the bulk of the PR (everything before the recent stability batch). Focused on the new projects (MeshWeaver.NuGet, MeshWeaver.Social) and a sampling of the central MessageHub refactor — the full 100-commit / 1006-file diff is too large for an exhaustive read. Same severity grouping as part 1.

Correctness — should fix before merge

12. ✅ 512adb462 NuGetAssemblyResolver caches faulted Tasks forever.
src/MeshWeaver.NuGet/NuGetAssemblyResolver.cs:42.

return _cache.GetOrAdd(key, _ => ResolveCoreAsync(requested, framework, ct));

If ResolveCoreAsync threw, the faulted Task<ResolvedPackageSet> stayed in the cache; subsequent calls replayed the same exception forever.

Fix: evict faulted/cancelled tasks from the cache before returning. Also pass CancellationToken.None to the shared core task so a single caller's cancellation can't take down the resolution for everyone else; per-caller ct projects via task.WaitAsync(ct).
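
A sketch of the eviction pattern this fix describes, assuming .NET 6 or later for Task.WaitAsync. SharedTaskCache and GetAsync are hypothetical names and this is not the resolver's actual code. The shared task runs with CancellationToken.None, and each caller observes its own token through WaitAsync.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical cache wrapper; names are assumptions for illustration.
public sealed class SharedTaskCache<TKey, TValue> where TKey : notnull
{
    private readonly ConcurrentDictionary<TKey, Task<TValue>> _cache = new();

    public Task<TValue> GetAsync(
        TKey key, Func<CancellationToken, Task<TValue>> factory, CancellationToken ct)
    {
        // Evict a failed entry so the next GetOrAdd starts a fresh attempt.
        // Removing by key AND value avoids evicting a newer task that
        // another caller may have stored in the meantime.
        if (_cache.TryGetValue(key, out var existing)
            && (existing.IsFaulted || existing.IsCanceled))
        {
            _cache.TryRemove(new KeyValuePair<TKey, Task<TValue>>(key, existing));
        }

        // The shared work ignores per-caller cancellation.
        var task = _cache.GetOrAdd(key, _ => factory(CancellationToken.None));

        // Only this caller's wait is cancelled by ct, not the shared task.
        return task.WaitAsync(ct);
    }
}
```

Under contention GetOrAdd may invoke the factory more than once and keep only one result; wrapping the value in Lazy<Task<TValue>> is a common way to avoid that if duplicate work is expensive.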

13. ✅ 512adb462 NuGetAssemblyResolver resolves with DependencyBehavior.Lowest.
src/MeshWeaver.NuGet/NuGetAssemblyResolver.cs:74. "Lowest" pulls minimum-satisfying versions transitively, which yanks in EOL/unpatched releases when constraints have weak floors.

Fix: switched to DependencyBehavior.HighestMinor so security fixes flow in transparently without crossing minor/major boundaries.

14. ✅ 6c3e60925 — Hydrated package not validated.
After INuGetPackageCache.TryHydrateAsync returned true, the resolver trusted the content — a poisoned cache entry (different package stored under wrong key) would silently load wrong assemblies.

Fix: post-hydration, the resolver opens the package folder via PackageFolderReader.GetIdentity() and verifies the .nuspec-declared (id, version) matches expected. On mismatch the directory is purged and the resolver falls back to the feed download path. No INuGetPackageCache contract change needed.

15. ✅ 478fdaa93 XPublisher.PublishAsync crashes on partial response.
src/MeshWeaver.Social/XPublisher.cs:71. The chained GetProperty("data").GetProperty("id") threw KeyNotFoundException on unexpected body shapes.

Fix: defensive TryGetProperty chain; logs a warning and returns id = null (caller treats as "publish succeeded but URN couldn't be captured") instead of crashing. Also guards against null AuthorHandle.
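
The defensive chain could look roughly like the following System.Text.Json sketch. TweetResponseParser and TryReadId are hypothetical names; the real publisher also logs a warning, which is omitted here. The ValueKind checks matter because TryGetProperty itself throws InvalidOperationException when called on a non-object element.

```csharp
#nullable enable
using System.Text.Json;

// Hypothetical parser illustrating the TryGetProperty chain.
public static class TweetResponseParser
{
    // Returns the "data.id" string, or null for any unexpected body shape.
    public static string? TryReadId(string json)
    {
        try
        {
            using var doc = JsonDocument.Parse(json);
            var root = doc.RootElement;
            if (root.ValueKind == JsonValueKind.Object
                && root.TryGetProperty("data", out var data)
                && data.ValueKind == JsonValueKind.Object
                && data.TryGetProperty("id", out var id)
                && id.ValueKind == JsonValueKind.String)
            {
                return id.GetString();
            }
        }
        catch (JsonException)
        {
            // Body was not valid JSON; treated the same as a missing id.
        }
        return null;
    }
}
```

A null result maps to the "publish succeeded but the id could not be captured" case described in the fix.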

16. ✅ 478fdaa93 (LinkedIn) + 512adb462 (X) — Publishers don't auto-retry on token-refresh race.
Fix: SendWith401RetryAsync helper in both publishers — on 401, force-refresh the token (zero ExpiresAt so EnsureFreshAsync doesn't short-circuit) and retry the request once.

Race / lifecycle hazards

17. ✅ 512adb462 PostStatsRefresher processes targets sequentially.
Fix: Parallel.ForEachAsync bounded by SocialOptions.StatsRefreshDegreeOfParallelism (default 8).

18. ✅ 512adb462 PostStatsRefresher has no per-target backoff.
Fix: ConcurrentDictionary<string, DateTimeOffset> of last-failure timestamps. Targets that failed within SocialOptions.StatsRefreshFailureBackoff (default 15 min) skip the next tick. Success clears the entry so the target rejoins normal cadence.
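
Items 17 and 18 combine naturally. The sketch below shows one way to pair Parallel.ForEachAsync with a last-failure dictionary, assuming .NET 6 or later. BackoffRefresher, TickAsync, and the explicit now parameter are hypothetical and chosen to keep the example deterministic; the real refresher reads its settings from SocialOptions.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical refresher; names and shape are assumptions.
public sealed class BackoffRefresher
{
    private readonly ConcurrentDictionary<string, DateTimeOffset> _lastFailure = new();
    private readonly TimeSpan _backoff;
    private readonly int _degreeOfParallelism;

    public BackoffRefresher(TimeSpan backoff, int degreeOfParallelism)
    {
        _backoff = backoff;
        _degreeOfParallelism = degreeOfParallelism;
    }

    public async Task TickAsync(
        IEnumerable<string> targets,
        Func<string, CancellationToken, Task> refresh,
        DateTimeOffset now,
        CancellationToken ct)
    {
        // Skip targets whose last failure is still inside the backoff window.
        var due = targets
            .Where(t => !_lastFailure.TryGetValue(t, out var failedAt)
                        || now - failedAt >= _backoff)
            .ToList();

        var options = new ParallelOptions
        {
            MaxDegreeOfParallelism = _degreeOfParallelism,
            CancellationToken = ct,
        };

        await Parallel.ForEachAsync(due, options, async (target, token) =>
        {
            try
            {
                await refresh(target, token);
                // Success clears the entry so the target rejoins normal cadence.
                _lastFailure.TryRemove(target, out _);
            }
            catch (Exception ex) when (ex is not OperationCanceledException)
            {
                // One failing target must not fail the whole tick.
                _lastFailure[target] = now;
            }
        });
    }
}
```

With the defaults quoted above this would be constructed as new BackoffRefresher(TimeSpan.FromMinutes(15), 8).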

19. ✅ df1939bb7 MessageHub faulted-Task cache pattern.
The MESHWEAVER_DISPOSE_TRACE=1 global file lock + per-call File.AppendAllText serialised hub teardown when many hubs disposed concurrently.

Fix: replaced with a single bounded Channel<string> (4096, FullMode = DropWrite) drained by one writer task started in the type initialiser. Producers TryWrite non-blocking; lines drop on full so a stuck writer never delays dispose.
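
A minimal sketch of the bounded-channel writer, using System.Threading.Channels. DropOnFullTraceWriter is a hypothetical name. Unlike the fix described above, this version is an instance class writing to an injected TextWriter rather than a static writer started in a type initialiser, purely to keep the example self-contained.

```csharp
using System.IO;
using System.Threading.Channels;
using System.Threading.Tasks;

// Hypothetical trace writer; names are assumptions for illustration.
public sealed class DropOnFullTraceWriter
{
    private readonly Channel<string> _lines;
    private readonly Task _drain;

    public DropOnFullTraceWriter(TextWriter sink, int capacity = 4096)
    {
        _lines = Channel.CreateBounded<string>(new BoundedChannelOptions(capacity)
        {
            // When the buffer is full the incoming line is discarded.
            FullMode = BoundedChannelFullMode.DropWrite,
            SingleReader = true,
        });

        // A single consumer serialises all writes to the sink.
        _drain = Task.Run(async () =>
        {
            await foreach (var line in _lines.Reader.ReadAllAsync())
                await sink.WriteLineAsync(line);
        });
    }

    // Never blocks the caller, so a slow sink cannot delay hub disposal.
    public void Trace(string line) => _lines.Writer.TryWrite(line);

    // Stops accepting lines and waits for buffered lines to be flushed.
    public Task CompleteAsync()
    {
        _lines.Writer.TryComplete();
        return _drain;
    }
}
```

In DropWrite mode TryWrite reports success even when the line is discarded, so callers cannot use its return value to detect drops.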

Style / consistency

20. ✅ 478fdaa93 SocialExtensions.AddSocialPublishing lifetime mismatch.
AddHttpClient<LinkedInPublisher>() registered the typed client as transient; the IPlatformPublisher factory then made it singleton — direct vs via-interface resolution returned different instances.

Fix: register the publisher as a true singleton via services.AddSingleton(sp => new LinkedInPublisher(httpFactory.CreateClient(...), ...)). Same for X. Both IPlatformPublisher and concrete-type resolution return the same instance.

21. ✅ 478fdaa93 - SocialExtensions claims "all-or-nothing" but isn't.
The four AddHostedService<…> calls were unconditional even with zero platforms configured.

Fix: gate hosted-service registration on anyConfigured; with zero platforms, no hosted services start.

22. ✅ 478fdaa93 - LinkedInPublisher uses dynamic to peek at typed-anonymous fields.
Fix: two concrete payload shapes in if/else branches; no dynamic dispatch; typos surface as compile errors instead of RuntimeBinderException.

23. ✅ 478fdaa93 — PII / user-content in error logs.
Fix: Truncate(b, 200) on logged error bodies in both publishers (LinkedIn publish + token refresh, X publish). Full body still goes to PublishResult.Error for the caller.

✅ Looks good (no action needed)

  • NuGetAssemblyResolver correctly caches by (framework, sorted package list) so repeated #r invocations don't re-walk dependencies.
  • MessageHub AsyncSubject pattern fixes the long-standing "subscribe before vs after response" race in the old RegisterCallback.
  • LinkedInPublisher correctly handles the LinkedIn x-restli-id header fallback and only falls back to JSON body parsing when the header is missing.
  • SocialOptions defaults look reasonable (60s publish tick, 30m stats tick, 30d window).
  • EnsureFreshAsync returns a refreshed PlatformCredential to the caller rather than mutating internal state — caller decides where to persist.

Areas not covered in this review

Persistence-service refactors (IStorageService, MeshNodeEditor, NavigationService changes), the +850-line MessageHub core-dispatch refactor in detail, content-collection changes, NodeType compilation pipeline beyond what part 1 touched. Flag a specific subsystem if a deeper review is wanted.

@rbuergi

rbuergi commented May 10, 2026

Contributor Author

Review fixes applied — all 23 items addressed

5 commits, organised by batch. Locally committed, not pushed yet.

| # | Item | Commit |
|---|------|--------|
| 1 | UNION SQL param-rename regex pass | e68636aac |
| 2 | UNION ALL + DISTINCT ON (namespace, id) for path-keyed dedup | e68636aac |
| 3 | ObserveQuery change-notifier OR-joined per-query filters | e68636aac |
| 4 | MeshQueryEngine Activity post-filter scans every basePath | e68636aac |
| 5 | RecoverStaleExecutingThread structural guard (drop time-based heuristic) | 478fdaa93 |
| 6 | using var on Subject<StreamingSnapshot> | 478fdaa93 |
| 7 | Regression assertion: final ThreadMessage.Status == Completed | eea8ed10a |
| 8 | Pre-allocate CancellationTokenSource in HandleSubmitMessage | 478fdaa93 |
| 9 | Collapse triple-stacked <summary> blocks | 478fdaa93 |
| 10 | IsExecutingLifecycleTest waits on CompletedAt, not text patterns | eea8ed10a |
| 11 | Limit-on-first-query semantics: enforce post-union via MinLimit | e68636aac |
| 12 | NuGetAssemblyResolver evicts faulted/cancelled cache entries | 512adb462 |
| 13 | NuGet DependencyBehavior.HighestMinor (was Lowest) | 512adb462 |
| 14 | Hydrated-cache validation note (deferred; needs INuGetPackageCache change) | 512adb462 |
| 15 | XPublisher defensive TryGetProperty chain | 478fdaa93 |
| 16 | LinkedIn / X publishers retry once on 401 with token refresh | 478fdaa93 (LinkedIn structure), 512adb462 (X 401 retry parity) |
| 17 | PostStatsRefresher uses Parallel.ForEachAsync (DOP 8) | 512adb462 |
| 18 | Per-target failure backoff (15 min default) | 512adb462 |
| 19 | Channel-based dispose trace replaces global file lock | df1939bb7 |
| 20 | SocialExtensions: factory-resolved singleton publishers | 478fdaa93 |
| 21 | Hosted services gated on at least one configured platform | 478fdaa93 |
| 22 | LinkedIn dynamic→concrete payload shapes | 478fdaa93 |
| 23 | Cap error-body logs at 200 chars (LinkedIn + X) | 478fdaa93 |

Verification

  • Solution build clean (memex/aspire/Memex.Portal.Distributed).
  • Tests I touched all pass locally:
    • IsExecutingLifecycleTest.SingleMessage_IsExecuting_FlipsTrueThenFalse_WithRealResponse — 11 s
    • ChatHistoryTest.ThreeMessages_AgentSeesFullHistory — 2 s
    • ChatHistoryTest.TwoMessages_NoDuplicates_CorrectRoles — 3 s
    • CancelThreadExecutionTest.CancelStream_StopsExecutionAndMarksAsCancelled — 3 s
  • The full MeshWeaver.Threading.Test suite has 4 unrelated pre-existing failures (not introduced by these commits — present on main as well).

Notes

  • Item #14 (cache content validation) is documented as a TODO rather than implemented: INuGetPackageCache.TryHydrateAsync doesn't currently expose a content hash to verify against, so the fix needs a contract change. Flagged in code at NuGetAssemblyResolver.EnsureInstalledAsync.
  • Item #5 (recovery time window) swapped the time-based heuristic for a structural one (PendingUserMessage + ActiveMessageId set → leave to WatchForExecution). Same intent, no time-bound failure mode.
  • Item #8 (CTS race) required a structural change: HandleSubmitMessage now pre-allocates and stores the CTS before the response goes out, and ExecuteMessageAsync reuses it from the parent hub slot. The auto-execute path (WatchForExecution) gets a fallback CTS if the slot is empty.

Ready to push when you want.

@rbuergi

rbuergi commented May 10, 2026

Contributor Author

Done — review item #14 is now closed in commit 6c3e60925. The hydrated folder is validated via PackageFolderReader.GetIdentity() against the expected (id, version); on mismatch the directory is purged and the resolver falls back to the feed. No INuGetPackageCache contract change needed — validation is in the resolver. Total: 6 commits, all 23 review items addressed.

rbuergi added a commit that referenced this pull request May 10, 2026
…fix DI lifetimes, redact PII, drop dynamic

- ThreadExecution: collapse triple-stacked <summary> blocks on
  WatchForExecution and NotifyParentCompletion. Tooling kept the last
  one anyway; the dead scaffolding was just noise.
- SocialExtensions: register LinkedInPublisher / XPublisher as TRUE
  singletons (factory-resolved with named HttpClient). The previous
  AddHttpClient<T>+AddSingleton<IPlatformPublisher> mix made the
  concrete type transient while the interface alias was singleton —
  direct vs via-interface resolution returned different instances.
  Also gate hosted-service registration on at least one platform
  being configured (the "all-or-nothing" comment was wrong; with
  zero platforms the four hosted services started anyway and faulted
  on first tick).
- LinkedInPublisher: replace `(dynamic)media.shareMediaCategory`
  peek with two concrete payload shapes — typo turns into a compile
  error instead of a RuntimeBinderException.
- LinkedIn / X publishers: cap error-body logs at 200 chars to
  bound PII exposure (the body can echo the user's post text on
  validation rejection). Full body still goes to PublishResult.Error
  for the caller.

Addresses PR #95 review items #9, #20, #21, #22, #23.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rbuergi added a commit that referenced this pull request May 10, 2026
… in-memory engines

PostgreSqlStorageAdapter.QueryNodesAsync(IReadOnlyList<ParsedQuery>):
  - Replace order-dependent `string.Replace` parameter rename with a
    single `Regex.Replace` keyed on @<name> word boundary that gates
    on perParams.ContainsKey. Sequential Replace was mangling adjacent
    tokens (renaming `@p` after `@p1` produced `@q0_q0_p1`) and could
    clobber `@…` substrings inside string literals / JSONB paths.
  - Switch from `UNION` to `UNION ALL` wrapped in
    `SELECT DISTINCT ON (namespace, id) ... ORDER BY namespace, id, last_modified DESC`.
    Plain UNION dedupes whole rows — two queries observing the same
    node at slightly-different last_modified would BOTH appear in the
    output. Path-keyed dedup (= MeshNode identity) with newest-wins
    tie-break collapses them correctly.
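
A minimal sketch of the single-pass rename, assuming a `@q<index>_<name>` target scheme (inferred from the `@q0_q0_p1` example above). `ParamRenamer.Rename` is a hypothetical helper, not the adapter's code.

```csharp
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ParamRenamer
{
    // Renames @name to @q<index>_name in one pass, but only for names that are
    // real parameters of this query. Hypothetical helper for illustration.
    public static string Rename(
        string sql,
        IReadOnlyDictionary<string, object> perParams,
        int queryIndex)
    {
        return Regex.Replace(sql, @"@(\w+)", match =>
        {
            var name = match.Groups[1].Value;

            // \w+ is greedy, so "@p1" is seen as "p1" and never as "p" + "1".
            return perParams.ContainsKey(name)
                ? $"@q{queryIndex}_{name}"
                : match.Value; // unknown @token: leave it untouched
        });
    }
}
```

Because the replacement happens in one scan, already-renamed tokens are never rescanned, which is what removes the order dependence. The `ContainsKey` gate only protects `@…` text whose name is not a parameter; a literal that happens to contain a real parameter name would still be rewritten in this sketch.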

PostgreSqlMeshQuery.ObserveQuery<T>:
  - Parse EVERY query in request.EffectiveQueries and build per-query
    (basePath, scope) filters; the change-notifier subscription
    OR-joins them so multi-query observations get delta refreshes
    triggered by ANY query's path/scope, not just query #0's. The
    previous shape silently lost live updates from queries #1+.

PostgreSqlMeshQuery.QueryNodesUnionAsync + MeshQueryEngine:
  - Drop the per-query `parsedList[0].Limit = request.Limit` injection.
    Query #0 hit its limit before yielding the union's most relevant
    rows, while queries #1+ contributed unbounded — making the result
    iteration-order dependent. Limit is now enforced post-union via
    MinLimit(request.Limit, firstParsed.Limit) so a request-level cap
    can't be circumvented and an in-query `limit:N` still wins when
    smaller.
  - MeshQueryEngine: CollectMatchedAsync returns the LIST of every
    query's basePath; the source:activity post-filter scans every
    base path's descendants and unions activity-main-paths so
    queries #1+ aren't filtered against query #0's subtree only.
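
The post-union limit can be sketched as below. The semantics of `MinLimit` are an assumption here (null means "no limit", and the smaller of two set limits wins); `LimitSketch` and `ApplyPostUnion` are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class LimitSketch
{
    // Assumed semantics: null means "no limit"; otherwise the smaller value wins.
    public static int? MinLimit(int? a, int? b) =>
        a is null ? b : b is null ? a : Math.Min(a.Value, b.Value);

    // Applies the effective limit once, after all queries have been unioned.
    public static IEnumerable<T> ApplyPostUnion<T>(
        IEnumerable<T> unioned, int? requestLimit, int? queryLimit)
    {
        var limit = MinLimit(requestLimit, queryLimit);
        return limit is null ? unioned : unioned.Take(limit.Value);
    }
}
```

Applying the cap after the union means every query contributes candidates before truncation, so the result no longer depends on which query happened to be first.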

Addresses PR #95 review items #1, #2, #3, #4, #11.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rbuergi added a commit that referenced this pull request May 10, 2026
…ThreadExecution stability fixes

ThreadExecution.cs (already in commit 478fdaa — recapping here for the
review-item index):
  - RecoverStaleExecutingThread: drop the 2-minute "fresh execution"
    window in favour of a structural check (skip when PendingUserMessage
    + ActiveMessageId are still set, i.e. the thread is an
    auto-execute candidate WatchForExecution will pick up). Closes the
    "long-running agent crashed at minute 5 → IsExecuting=true forever"
    gap; the time-based heuristic contradicted commit 6dc436b's
    "no time limits" stance.
  - Subject<StreamingSnapshot>: declare with `using var` so the
    Subject itself disposes alongside its subscription. Minor leak
    per execution previously.
  - HandleSubmitMessage: pre-allocate the per-round
    CancellationTokenSource and store it on the thread hub BEFORE
    posting SubmitMessageResponse — closes the race where an early
    Stop click between IsExecuting=true and ExecuteMessageAsync's
    `parentHub.Set(executionCts)` found a null CTS slot and
    silently no-op'd. ExecuteMessageAsync now reuses the
    pre-allocated CTS (with a fallback for the auto-execute path
    that bypasses HandleSubmitMessage).

IsExecutingLifecycleTest.cs:
  - Migrate the response-text wait from text-pattern matching
    (skipping placeholders "Allocating agent..." etc.) to
    `ThreadMessage.CompletedAt is not null`, which
    ExecuteMessageAsync sets only on the terminal
    PushToResponseMessage call. Same pattern adopted in
    ChatHistoryTest in commit ab3af8b.
  - Add a regression assertion that final
    ThreadMessage.Status == Completed. The terminal-status guard in
    PushToResponseMessage prevents the late Sample(100ms)-flushed
    Streaming push from regressing the cell from Completed back to
    Streaming; this assertion catches any future regression of that
    guard.

Addresses PR #95 review items #5, #6, #7, #8, #10.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rbuergi added a commit that referenced this pull request May 10, 2026
…, parallelism, backoff)

NuGetAssemblyResolver:
  - Evict faulted/cancelled tasks from the per-key cache before
    returning. A transient feed failure (network, throttle, cancelled
    in-flight resolve) used to poison the cache for the resolver's
    lifetime — every subsequent call replayed the same exception.
  - Pass CancellationToken.None to the shared core task so a single
    caller's cancellation can't take down the resolution for
    others; per-caller `ct` projects via `task.WaitAsync(ct)`.
  - Switch DependencyBehavior from `Lowest` to `HighestMinor` so
    `#r` directives pick up patch-level security fixes via
    transitive dependencies without silently jumping major/minor.
  - Document that hydrated cache content is trusted to match
    (id, version) — flag for future content-hash verification if
    cache poisoning becomes a concern.
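
The eviction and cancellation points above can be sketched together. `SharedResolveCache<T>` is a hypothetical, simplified stand-in for the resolver's per-key task cache, assuming .NET 6 or later for `Task.WaitAsync`.

```csharp
using System;
using System.Collections.Concurrent;
using System.Collections.Generic;
using System.Threading;
using System.Threading.Tasks;

public sealed class SharedResolveCache<T>
{
    private readonly ConcurrentDictionary<string, Task<T>> _cache = new();

    public Task<T> GetAsync(
        string key,
        Func<CancellationToken, Task<T>> resolve,
        CancellationToken ct)
    {
        // Evict a faulted or cancelled entry so it cannot replay its failure.
        // Removing by key+value avoids evicting a newer entry added by a racing caller.
        if (_cache.TryGetValue(key, out var existing)
            && (existing.IsFaulted || existing.IsCanceled))
        {
            _cache.TryRemove(new KeyValuePair<string, Task<T>>(key, existing));
        }

        // The shared work runs without any single caller's token.
        // Note: under a race the factory may run more than once; only one task is kept.
        var shared = _cache.GetOrAdd(key, _ => resolve(CancellationToken.None));

        // Only this caller's wait is cancellable; the shared task keeps running.
        return shared.WaitAsync(ct);
    }
}
```

A caller that cancels gets an `OperationCanceledException` from its own wait, while other callers awaiting the same key are unaffected.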

LinkedInPublisher / XPublisher (LinkedIn already committed in batch A
for the dynamic+PII parts; this commit adds the 401 retry):
  - SendWith401RetryAsync: on the FIRST 401 response from a publish,
    force-refresh the token (zero ExpiresAt before EnsureFreshAsync)
    and retry once. Closes the race where the access token's TTL
    expired between EnsureFreshAsync and the actual API call.

PostStatsRefresher:
  - Process due-refresh targets via Parallel.ForEachAsync bounded
    by SocialOptions.StatsRefreshDegreeOfParallelism (default 8),
    so a slow API + large refresh window can't let one tick
    overshoot the next interval.
  - Per-target failure backoff via a ConcurrentDictionary of
    last-failure timestamps — targets that failed within
    StatsRefreshFailureBackoff (default 15 min) skip the next tick.
    Stops a degraded platform from generating thousands of repeat
    warnings every cycle while the underlying issue is fixed.
    Success clears the backoff entry.

SocialOptions: add StatsRefreshDegreeOfParallelism (8) and
StatsRefreshFailureBackoff (15 min) knobs.

Addresses PR #95 review items #12, #13, #14, #16, #17, #18.
(#15 XPublisher defensive parse + the LinkedIn dynamic / PII items
were already in commit 478fdaa.)

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rbuergi added a commit that referenced this pull request May 10, 2026
… file lock

The MESHWEAVER_DISPOSE_TRACE=1 trace took a global lock per call
(`File.AppendAllText` under `lock (DisposeTraceLogLock)`), serialising
hub teardown under load when many hubs disposed concurrently.

Replaced with a single bounded `Channel<string>` (capacity 4096,
FullMode = DropWrite) drained by one writer task started in the
type initialiser. Producers `TryWrite` non-blocking — if the disk is
slow / locked, lines drop on full instead of putting back-pressure
on dispose. Single-reader semantics avoid contention on the file
handle.

Addresses PR #95 review item #19.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
rbuergi added a commit that referenced this pull request May 10, 2026
Replaces the TODO from commit 512adb4. After a successful
INuGetPackageCache.TryHydrateAsync, the resolver now opens the
hydrated folder via PackageFolderReader and compares the package's
own .nuspec-declared (id, version) against the expected (id, version).
On mismatch the directory is purged and the resolver falls back to
the feed.

This catches the failure modes #14 was about: wrong package stored
under right key (cross-tenant blob, accidental copy, drift after a
manual edit). The .nuspec is the canonical NuGet source of truth, so
a tampered cache entry can't fake the identity without rewriting the
nuspec — which we'd then catch at hydration time.

No INuGetPackageCache contract change; validation lives entirely in
the resolver.

Closes the last open item from PR #95 review (item #14).

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
@rbuergi

rbuergi commented May 26, 2026

Contributor Author

@copilot resolve the merge conflicts in this pull request

@github-actions

github-actions Bot commented Jun 3, 2026


Test Results (shard 1)

997 tests: 992 ✅ passed, 0 💤 skipped, 5 ❌ failed
11 suites, 11 files, 7m 45s ⏱️

For more details on these failures, see this check.

Results for commit 9439fac.

♻️ This comment has been updated with latest results.

@github-actions

github-actions Bot commented Jun 3, 2026


Test Results (shard 0)

2 469 tests: 2 456 ✅ passed, 6 💤 skipped, 7 ❌ failed
2 494 runs: 2 481 ✅ passed, 6 💤 skipped, 7 ❌ failed
13 suites, 13 files, 12m 33s ⏱️

For more details on these failures, see this check.

Results for commit dee35a4.

♻️ This comment has been updated with latest results.

@github-actions

github-actions Bot commented Jun 3, 2026


Test Results (shard 3)

322 tests: 322 ✅ passed, 0 💤 skipped, 0 ❌ failed
8 suites, 8 files, 1m 2s ⏱️

Results for commit 9439fac.

♻️ This comment has been updated with latest results.

@github-actions

github-actions Bot commented Jun 3, 2026


Test Results (shard 2)

1 098 tests: 1 096 ✅ passed, 1 💤 skipped, 1 ❌ failed
12 suites, 12 files, 5m 57s ⏱️

For more details on these failures, see this check.

Results for commit 9439fac.

♻️ This comment has been updated with latest results.

rbuergi and others added 30 commits June 21, 2026 12:45
…enied)

BuildModelQueries added namespace:{currentPath}/_Provider WITHOUT the
reserved-partition filter the agent/skill registry queries already use. On the
login page currentPath resolves to the rogue "login" ROUTE partition, so the
query read the policy-less partition and AccessControlPipeline failed the WHOLE
query with "lacks Read permission on 'login'" -> empty model picker. Apply the
same IsReservedPartition guard to currentPath + nodeTypePath. Tests:
BuildModelQueries_ReservedCurrentPath_IsSkipped / _RealCurrentPath_IsIncluded.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ub-hub reads parent)

A hub initializes and syncs its own EntityStore under its own credential (ImpersonateAsHub
=> AccessContext.ObjectId = the hub's mesh address) and a sub-hub subscribes to its parent/
owner the same way. No AccessAssignment ever exists for a hub address, so the owner's RLS
denied that Read => the sub-hub never got its parent's snapshot, its layout area never
rendered, and FutuRe LineOfBusiness Search timed out (50s). Access was not propagating.

Mark hub credentials explicitly (AccessContext.IsHub, set by both ImpersonateAsHub paths)
and grant a hub credential Read on its OWN path + ANCESTOR scopes only (the sync direction)
- never siblings, descendants, the mesh root, or a non-hub identity. Short-circuits before
the cold permission-query path, so hub self-sync no longer rides it either.
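
The "own path plus ancestors only" rule can be illustrated with a small path check. This is a sketch under assumptions: paths are taken to be `/`-separated, the empty string stands for the mesh root, and `HubReadScope.IsSelfOrAncestor` is a hypothetical helper rather than the actual access-control code.

```csharp
using System;

public static class HubReadScope
{
    // True when targetPath is the hub's own path or one of its ancestors.
    // Siblings, descendants, and the (empty) mesh root are all rejected.
    public static bool IsSelfOrAncestor(string hubPath, string targetPath)
    {
        if (string.IsNullOrEmpty(hubPath) || string.IsNullOrEmpty(targetPath))
            return false;

        // The trailing "/" keeps "a/b" from matching as an ancestor of "a/bc".
        return string.Equals(hubPath, targetPath, StringComparison.Ordinal)
            || hubPath.StartsWith(targetPath + "/", StringComparison.Ordinal);
    }
}
```

For a hub at `a/b/c`, this grants `a/b/c`, `a/b`, and `a`, and denies `a/b/c/d`, `a/x`, and the root.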

Repro: HubCredentialReadAccessTest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
On ConnectStatus.Connected the widget showed "✓ Connected" but stayed up until the
user clicked ✕. Now it auto-dismisses ~1.2s after success via a reactive
Observable.Timer (no Task.Delay), stored in _connectSub so a dispose / new connect
cancels the pending close. New CloseConnectWidget clears the UI state without
cancelling the (already-completed) session.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
… side-panel header

.thread-chat-widget floats up from the input (bottom: calc(100%+6px)); a fixed
280px (or the login dialog's 560px) overran the "New Thread" side-panel header on
a short panel — the agent/model picker AND the login dialog were clipped at the
top. Cap at min(440px, calc(100vh - 180px)) so it always fits between the header
and the input; the inner list scrolls past that. Drops the login widget's bespoke
inline max-height in favour of the shared responsive cap.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…d it back

"Not logged in after reload": the token IS stored in the ModelProvider node, but
the harness's oauthToken comes from the node-backed resolver whose cache is cold
after a reload / pod restart, and `claude setup-token` only PRINTS the token (it
never writes .credentials.json) — so nothing authenticated the CLI.

Now ClaudeConnectStrategy.PersistCredentials writes the captured token to
{configDir}/.credentials.json on the shared config-dir volume (inside the Process
IoPool worker; sync I/O), and ClaudeCodeChatClient falls back to reading it
(effectiveToken = oauthToken ?? ReadCredentialsToken) when the resolver returns
null — so CLAUDE_CODE_OAUTH_TOKEN is still set and the session authenticates.
ConnectStrategyTest now asserts the file is written holding the captured token.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…CLI exit 1)

A `claude setup-token` token is consumed via the CLAUDE_CODE_OAUTH_TOKEN env var,
NOT a .credentials.json file (that file is the interactive `claude login` OAuth-
bundle schema). Writing the token there made the CLI parse a malformed creds file
and exit 1 (the blackout + ProcessException). Revert PersistCredentials + its test
assertion. The token still lives in the persistent ModelProvider node and is
re-applied to CLAUDE_CODE_OAUTH_TOKEN by the harness (effectiveToken); the harness
also reads back the CLI's OWN .credentials.json when present (harmless, kept).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s (match PG)

The in-memory storage adapter keeps satellites in one store, so a content query
(e.g. scope:descendants) returned satellite-path nodes that PG never would (PG
stores _Access/_Thread/... in separate per-prefix tables). This surfaced as the
auto-created {partition}/_Access/{creator}_Access grant polluting Query.Test results.
RunQueryNodes now drops satellite-path rows for non-satellite-targeted queries;
explicit satellite queries (a _-segment path, satellite nodeType, source:activity/
accessed) are unaffected. Mirrors the existing MergeAutocompleteSnapshots exclusion.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ty (fixes blackout)

The per-thread submission watcher's own-writes (ReconcileUserMessageIds + the
claim Update) drive the data-source SynchronizationStream.Update, which posts a
User-attributed UpdateStreamRequest from the sync/<id> hub. The watcher fires on
a scheduler that does NOT carry the originating user's AsyncLocal AccessContext
(null on the Orleans grain / no circuit), so the post went out with NO
AccessContext -> the never-null PostPipeline guard failed it -> a DeliveryFailure
storm the submit never recovered from = "thread disappears on submit". Fix: wrap
both own-writes in AccessContextScope.FromNode(threadNode) so they FORCIBLY run
under the thread OWNER's identity (the access check already gated the submit; the
round inherits that trust). No System impersonation, no null-tolerant fallback.
Test: ExecuteThreadMessageTest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…s /code skill

The chat slash-skill autocomplete only offered the 3 built-in skills (agent/
model/harness) because it didn't collect space/user skills. BuildSkillQuery now
lists namespace:{user}/Skill|{space}/Skill|Skill nodeType:Skill — the SAME
per-partition registry pattern as agents/models, reserved-partition filtered —
and SkillAutocompleteProvider uses it, so a user's/space's Skill nodes (e.g.
AgenticPension/Skill/*) appear in the autocomplete via inheritance. Adds a /code
built-in skill (Data/Skill/code.md). Tests: AgentPickerQueriesTest.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
abac5de auto-provisions a Space root at each partition path with a CURRENT
LastModified. Being a real mesh_nodes row (not a satellite), it leaked into content
queries on PG and in-memory alike — dominating recency sorts (is:main
scope:descendants sort:LastModified-desc) and showing up under exact partition-path
reads. Partition roots are structural containers, not content: RunQueryNodes now drops
them unless the query explicitly targets nodeType:Space. Fixes FanOut + EmptyQuery and
the real 'partition roots flood recent-items' regression.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The RC line is done; bump the central PlatformVersion default to the clean
release. The version machinery already supports a non-prerelease core (continuous
builds switch the suffix to -ci.N; AssemblyVersion stays 3.0.0.<build>).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…envs

Every merge to main that passes 'MeshWeaver Build and Test' (gated via workflow_run)
builds the memex-portal-ai + memex-migration images to the shared ACR (tagged by
commit SHA + moving 'main'), then rolls them out to memex, atioz, and memex-cloud
via the documented kubectl set-image + rollout flow. The RELEASED channel (clean
NuGet + GHCR images on a v*.*.* tag) is unchanged.

Requires Azure CI credentials (AzureN OIDC: AZURE_CLIENT_ID/TENANT_ID/SUBSCRIPTION_ID
with AcrPush + az aks command invoke rights) before it can run.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…push)

Adds GitWorkingTreeService: clone a GitHub repo onto the workspace volume, read/
write files, commit, and push — the working-tree counterpart to content sync,
shared by the in-portal editor AND the co-hosted AI CLIs. Reactive end-to-end;
git runs as a blocking Process leaf through IIoPool, the user's token is injected
via an env-based credential helper (never in argv), and paths are per-user isolated.

Infra: a memex-workspace RWX PVC mounted at /workspace (memex + memex-cloud only,
NOT atioz), and git added to the portal-ai base image (the AI CLIs need it too).

Hermetic tests cover the full clone->edit->commit->push round trip against a local
bare repo, idempotent re-clone, the no-credential error, and path-escape rejection.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r (kills residual AccessContext storm)

InitializeThreadLifecycle's self-healing recovery observation drove its writes
(Executing->Idle reset, HonorPendingCancelOnWake, ResumeInterruptedRound)
WITHOUT the AccessContextScope.FromNode owner scope that ExecRoundWatcher already
applies. On a context-less re-establish continuation those writes posted an
UpdateStreamRequest from the sync hub with null AccessContext -> the never-null
PostPipeline guard failed it -> a DeliveryFailure storm that faulted the
observation -> it re-established in a tight loop ("[ThreadExec] Init observation
faulted for Thread - re-establishing", ~44/70s on atioz). Wrap the whole recovery
handler in AccessContextScope.FromNode(node) so every recovery write forcibly
carries the thread owner's identity. Same root-cause class as the submission-
watcher fix (3f28da5) -- forcibly have an AccessContext = thread owner.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…t exact path

The partition-root exclusion broke UserPublicReadTest.DynamicallyCreated_SpaceNode:
an exact 'path:Globex' read of an explicitly-created Space must return it. Scope the
exclusion to non-Exact (Children/Descendants/Ancestors) queries — where structural
roots pollute listings/recency (FanOut) — and leave EXACT path reads alone. Update
QueryAsync_EmptyQuery to assert on an intermediate namespace-prefix ({p}/sub), which
genuinely has no node, since the first-segment partition root {p} is now auto-provisioned.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The access-denied warning didn't name what triggered it, so a denial on a
rogue/reserved path (e.g. 'login') was an unattributable warning. Log the
delivery message type + sender + hub so the caller can be pinned.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r stale bare MainNode

ROOT (systemic): AccessService.Context is a per-hub AsyncLocal -- it flows through
await/ConfigureAwait(false) within one hub but is null on any other hub's scheduler
or a continuation that crossed a hub boundary. So every deferred write that reaches
SynchronizationStream.Update (layout-area render emissions, watchers, the agent
streaming loop, thread paths) posted UpdateStreamRequest with a null AccessContext,
which the never-null PostPipeline guard fails closed -> the "hub=sync/...
UpdateStreamRequest ... no AccessContext" DeliveryFailure storm (layout areas
$Menu/$Dialog/Settings, threads, AI-settings back-forth).

FIX 1 (central, the chokepoint): SynchronizationStream captures the subscribing
user's AccessContext ONCE at construction (the circuit / SubscribeRequest handler
thread, where AccessService.Context is the real user) with a first-subscribe
fallback, and Update restores it when the live context is null:
  capturedContext = CaptureCallerAccessContext(hub) ?? _creationContext;
A present live context always wins; CaptureRealUserContext refuses null/IsHub/
hub-shaped(sync,mesh,node,activity,portal)/system-security, so a hub identity can
never leak into CreatedBy and infra-created streams fall back to existing behaviour.
Carries identity through ALL deferred continuations at one seam instead of
site-by-site FromNode patches.

FIX 2: a node first built BARE (new MeshNode("Datenextraktion")) keeps its stale
bare MainNode after a later `with { Namespace }` (MainNode is stored, not computed),
and that bare id flows MainNode -> NavigationContext -> StartThread namespace -> a
thread created under the non-existent "Datenextraktion" partition -> Postgres 42P01.
Re-stamp MainNode = Path at the create boundary, with a precise trigger (MainNode ==
bare Id on a namespaced non-satellite node) that spares legitimate parent-pointer
MainNodes (e.g. GitHubSyncConfig.MainNode = spacePath).

Also: update SkillNodeTypeTest for the /code built-in skill (37e4541 shipped the
skill but never updated this assertion -> HEAD was red).

Tests: StreamUpdateIdentityTest (continuation Update restores captured context;
hub-address stream captures null), PartitionRootBootstrapTest (+MainNode repair),
Data.Test 239/239, AI.Test 574 + SkillNodeTypeTest 7/7.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r per-harness status bar)

Skills become harness-scoped DIMENSIONS: a skill is a value + a picker + a status
chip — one concept end to end. New SkillDefinition.Harness (null = MeshWeaver /
everywhere) lets each harness own its selectors through the existing skill/command
layer: Claude Code will ship /model + /effort (Harness="ClaudeCode"), Copilot its
own, MeshWeaver keeps /agent + /model. The status row then renders exactly the
ACTIVE harness's skills, each chip showing the current composer value and clickable
into its Pick combobox. This commit is the data-layer foundation (the field +
front-matter parse); harness-scoped discovery, the /effort skill + effort nodes,
and the skill-driven status row follow.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Adds WorkingTreeTab, a Space settings tab that checks out the Space's connected
GitHub repo as a working tree, lists its files, opens one in a Monaco editor, and
commits + pushes the edit as the user. Same visibility gate + reactive data-binding
pattern as the GitHub Sync tab; Monaco wiring mirrors CodeLayoutAreas.Edit. Wired
into MemexConfiguration next to the GitHub Sync tab.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…l AKS publish

The Release publish runs BakeMeshLocalFeed, which packs MeshWeaver.BusinessRules.
Because BusinessRules is decoupled from the portal project graph (#r nuget only),
the publish's implicit restore skips it, so a stale local obj lacking the
net10.0/linux-x64 target fails the publish with NETSDK1047. Document the one-line
restore fix; CI is immune (clean checkout). Hit while deploying the working-tree feature.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Documents the correct procedure for creating a Space so everything works: create
(not update, which skips partition provisioning + the creator-Admin grant), author
a real summary body (an empty Space renders a placeholder whose catalog embed shows
nothing), set a logo + icon, and optionally link a GitHub repo via _GitSync.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…caller context

SynchronizationStream.Update posted UpdateStreamRequest with a NULL AccessContext when
both the live AsyncLocal (lost on the Rx hop) and the captured creation context were null
— the cold-start case: the first cross-hub write into a freshly-activated owner. The
never-null PostPipeline guard failed it closed, so the patch never committed. This was the
Orleans cold-start submit deadlock (ColdStart_AgentSeesAllPreviousMessages, 2-core): the
user's pending message never landed on the thread node, the submission watcher saw
pending=0 forever, no round dispatched, Messages.Count stuck < 6 → 30s timeout.

The data-source sync write is INFRASTRUCTURE — the user's access was already enforced
upstream at the PatchDataRequest boundary — so it posts as System (the sanctioned infra
fallback per AccessContextPropagation.md) instead of a null context. Only the
currently-fail-closed null path changes; warm streams carry the creating user via
_creationContext and never reach the fallback. Verified: the cold-start test went from a 53s timeout to passing in 11s.
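
The fallback order described above can be sketched as follows. This is a minimal illustration, not the actual SynchronizationStream code: the `AccessContext` record, the `SyncWriteContext` class, and the `SystemContext` value are hypothetical stand-ins, and only the ordering (live context, then captured creation context, then System) comes from the commit text.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the real access context type.
public sealed record AccessContext(string ObjectId);

public static class SyncWriteContext
{
    // Assumed name and value for the infrastructure identity.
    public static readonly AccessContext SystemContext = new("system");

    // Warm streams resolve to the live or creation context.
    // Only the cold-start case (both null) reaches the System fallback.
    public static AccessContext Resolve(AccessContext? live, AccessContext? creation)
        => live ?? creation ?? SystemContext;
}
```

In this sketch a warm stream never changes behaviour, because `creation` is non-null and wins before the fallback is considered.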

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…ave Settings (models+keys only)

Agents and skills are user/space CONTENT, not admin config:

- AgentChatClient base prompt: when the user repeats a multi-step task (this thread
  or across threads), the agent PROACTIVELY offers to save it as a /<name> Skill,
  and knows that "create a skill" means create a nodeType:Skill node (Instructions
  and/or Action) under {user}/Skill (private) or {space}/Skill (shared). Lives in
  the shared base prompt so every conversational agent offers it; one-shot/utility
  agents never see a repeating user so it never fires.
- ChatCommands.md: document the proactive behavior + user-vs-space creation.
- Settings is admin-only territory (models + keys): remove the Agents settings tab
  and retire AgentsSettingsTab.cs. Agents/skills surface on the user overview's
  namespace listing (UserActivityLayoutAreas children: namespace:{user}) and are
  created from chat -- no settings tab, no parallel UI.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…rry-forward CircuitContext

Documents prominently the standing rule: in a node/thread/activity context the NODE OWNER
(resolved from the node) is the access context, injected EVERYWHERE and CARRIED FORWARD across
Rx hops via CircuitContext (Context alone is wiped on the hop). Genuine infra (doc sync, cache
hydration) runs as System; an EMPTY context is rejected instantly, never faked into hub-self.

New doc Data/Architecture/OwnerInjection.md (cross-linked from AccessContextPropagation.md) with
the cold-start submit deadlock as the worked example (owner-side data-source sync write posted a
NULL UpdateStreamRequest -> fail-closed -> pending never landed -> 30s timeout).

Code: ThreadExecution.SetThreadHubIdentity now stamps the thread owner as BOTH Context AND
CircuitContext (the carry-forward slot). This is the thread-hub step of owner injection; the
remaining cross-cutting piece is injecting the owner at the per-node data-source sync stream
(a separate sync sub-hub does not inherit the thread hub's CircuitContext) — tracked in the doc.
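
A simplified model of the two-slot rule, assuming nothing about the real ThreadExecution or hub types: `HubIdentitySketch`, `SetOwner`, `SimulateHop`, and `Effective` are hypothetical names. It only illustrates that the owner is stamped into both slots, that a hop wipes the ambient slot, and that an empty context is rejected rather than replaced.

```csharp
#nullable enable
using System;

// Hypothetical stand-in for the real access context type.
public sealed record AccessContext(string ObjectId);

public sealed class HubIdentitySketch
{
    // Ambient slot: lost when work hops to another scheduler.
    public AccessContext? Context { get; private set; }

    // Carry-forward slot: survives the hop.
    public AccessContext? CircuitContext { get; private set; }

    // Stamp the owner as BOTH Context and CircuitContext.
    public void SetOwner(AccessContext owner)
    {
        Context = owner;
        CircuitContext = owner;
    }

    // Models an Rx hop by wiping only the ambient slot.
    public void SimulateHop() => Context = null;

    // Prefer the ambient slot, fall back to the carried one,
    // and reject an empty context instead of faking an identity.
    public AccessContext Effective()
        => Context ?? CircuitContext
           ?? throw new InvalidOperationException("Empty access context rejected.");
}
```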

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…e mesh (never materialised to disk)

Comment-only fix on SkillDefinition (XML doc) + the mirrored ChatCommands.md inline
comment: AutoMount means an instruction skill is ADVERTISED up-front to the CLI
harnesses + the MeshWeaver agent so it's discoverable without being asked -- read
from the mesh on demand, NEVER materialised to a shared skills directory on disk.
Corrects the stale "mounted/materialised to disk" wording to match the actual
behaviour. No logic change.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
…r resolver, not async hub identity

Sharpen OwnerInjection.md with the proven root cause. SYNCUPD probe on the data-source sync
stream (ds/…/history-cold-start, whose Host IS the thread hub) shows two writes: the first
(cold-start submit pending) has hostReal=null and posts a null AccessContext (fail-closed); the
second, ~200ms later, has hostReal=TestUser. SetThreadHubIdentity establishes the owner
ASYNCHRONOUSLY (GetMeshNode round-trip) and the first write loses that race.

Correction to the prior note: the Host is NOT wrong/empty — it is the right hub; only the async
timing loses. The fix is a SYNCHRONOUS owner resolver: the data-source stream resolves the owner
from the node already in its Current (CreatedBy) at write time, wired by the MeshNode-aware
data-source layer (generic SynchronizationStream<EntityStore> can't read CreatedBy itself). This
removes the race; no System fallback (would violate StreamUpdate_WithoutAsyncLocalIdentity_FailsClosed).
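
A minimal sketch of what a synchronous owner resolver could look like, under stated assumptions: `MeshNodeSketch`, `AccessContext`, and `OwnerResolver` are hypothetical types, and only the idea (read `CreatedBy` from the node already held in the stream's current state at write time, with no async round-trip and no System fallback) is taken from the commit text.

```csharp
#nullable enable
using System;

// Hypothetical stand-ins; the real MeshNode and access context types differ.
public sealed record MeshNodeSketch(string Path, string? CreatedBy);
public sealed record AccessContext(string ObjectId);

public static class OwnerResolver
{
    // Resolves the owner synchronously from the node already in memory,
    // so the first write cannot lose a race against an async lookup.
    public static AccessContext ResolveAtWriteTime(MeshNodeSketch? current)
    {
        var owner = current?.CreatedBy;
        if (string.IsNullOrEmpty(owner))
        {
            // No owner available: fail closed rather than fall back to System.
            throw new InvalidOperationException("No owner on the current node.");
        }
        return new AccessContext(owner);
    }
}
```

Because the lookup is a plain property read, the first and second writes in the probe scenario would resolve the same owner in this model.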

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Cross-link the Owner Injection rule (node owner = standing access context, injected everywhere,
carried forward via CircuitContext; doc-sync/cache = System; empty context rejected) from every
relevant doc: Architecture index (Security row) + Glossary term; ThreadOperations (new 'thread
identity = owner' section); CqrsAndContentAccess (never-null/GetStream); ActivityOperations +
ActivityControlPlane (activity owner); RequestViaStreamUpdate (identity across the cross-hub hop —
the cold-start race lives exactly here). DocumentationLinkIntegrityTest green.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>